Goto

Collaborating Authors

 random fourier feature


Interpretable and Parameter Efficient Graph Neural Additive Models with Random Fourier Features

Neural Information Processing Systems

Graph Neural Networks (GNNs)excel at jointly modeling node features and topology, yet their black-box nature limits their adoption in real-world applications where interpretability is desired. Inspired by the success of interpretable Neural Additive Models (NAM)for tabular data, Graph Neural Additive Network (GNAN) extends the additive modeling approach to graph data to overcome limitations of GNNs. While being interpretable, GNANrepresentation learning overlooks the importance of local aggregation and more importantly suffers from parameter complexity. To mitigate the above challenges, we introduce Graph Neural Additive Model with Random Fourier Features (G-NAMRFF), a lightweight, self-interpretable graph additive architecture. G-NAMRFF represents each node embedding as the sum of feature-wise contributions where contributions are modeled via a Gaussian process (GP)with a graph-and feature-aware kernel. Specifically, we construct a kernel using Radial Basis Function (RBF) with graph structure induced by Laplacian and learnable Finite Impulse Response (FIR) filter. We approximate the kernel using Random Fourier Features (RFFs) which transforms the GPprior to a Bayesian formulation, which are subsequently learnt using a single layer neural network with size equal to number of RFF features. G-NAMRFF is light weight with 168 fewer parameters compared to GNAN. Despite its compact size, G-NAMRFFmatches or outperforms state-of-the-art GNNs and GNAN on node and graph classification tasks, delivering real-time interpretability without sacrificing accuracy 1.


Interpretable and Parameter Efficient Graph Neural Additive Models with Random Fourier Features

Neural Information Processing Systems

Graph Neural Networks (GNNs) excel at jointly modeling node features and topology, yet their black-box nature limits their adoption in real-world applications where interpretability is desired. Inspired by the success of interpretable Neural Additive Models (NAM) for tabular data, Graph Neural Additive Network (GNAN) extends the additive modeling approach to graph data to overcome limitations of GNNs. While being interpretable, GNAN representation learning overlooks the importance of local aggregation and more importantly suffers from parameter complexity. To mitigate the above challenges, we introduce Graph Neural Additive Model with Random Fourier Features (G-NAMRFF), a lightweight, self interpretable graph additive architecture. G-NAMRFF represents each node embedding as the sum of feature wise contributions where contributions are modeled via a Gaussian process (GP) with a graph-and feature-aware kernel. Specifically, we construct a kernel using Radial Basis Function (RBF) with graph structure induced by Laplacian and learnable Finite Impulse Response (FIR) filter. We approximate the kernel using Random Fourier Features (RFFs) which transforms the GP prior to a Bayesian formulation, which are subsequently learnt using a single layer neural network with size equal to number of RFF features. G-NAMRFF is light weight with $168\times$ fewer parameters compared to GNAN. Despite its compact size, G-NAMRFF matches or outperforms state-of-the-art GNNs and GNAN on node and graph classification tasks, delivering real-time interpretability without sacrificing accuracy.








Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series

arXiv.org Machine Learning

Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to forecasting foreign exchange and electricity demand show that ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.


Sampled Softmax with Random Fourier Features

Neural Information Processing Systems

The computational cost of training with softmax cross entropy loss grows linearly with the number of classes. For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method. However, the sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is again expensive to compute. Therefore, a widely employed practical approach involves sampling from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and the work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method that utilizes the powerful Random Fourier Features to enable more efficient and accurate sampling from an approximate softmax distribution. We show that RF-softmax leads to low bias in estimation in terms of both the full softmax distribution and the full softmax gradient.